Comparing Comparisons: Document Clustering Evaluation Using Two Manual Classifications

Authors

  • Magnus Rosell
  • Viggo Kann
Abstract

“Describe your occupation in a few words” is a question answered by 44,000 Swedish twins. Each respondent was then manually categorized according to two established occupation classification systems. Would a clustering algorithm have produced satisfactory results? Usually, this question cannot be answered: existing quality measures tell us how much an algorithmic clustering deviates from a manual classification, not whether that deviation is acceptable. In our situation, however, with two different manual classifications (in the classification systems AMSYK and YK80), we can construct such quality measures. If the algorithmic result differs no more from the manual classifications than they differ from each other (comparing the comparisons), we have an indication that it is useful. Further, we use the kappa coefficient as a clustering quality measure: using one manual classification as a coding scheme, we assess the agreement between a clustering and the other manual classification. After applying both of these novel evaluation methods, we conclude that our clusterings are useful.
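
The kappa-based evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how clusters are mapped onto the coding scheme, so the majority-label mapping in `recode` below is an assumption made for illustration, not the authors' procedure. The function names and the toy data are likewise hypothetical.

```python
from collections import Counter, defaultdict


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from the two marginal label distributions.
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences consist of one identical label
    return (observed - expected) / (1 - expected)


def recode(groups, scheme):
    """Relabel each group by the most common scheme label among its members.

    ASSUMPTION: majority mapping is one plausible way to express a grouping
    in the categories of the coding scheme; the source does not specify it.
    """
    members = defaultdict(Counter)
    for g, s in zip(groups, scheme):
        members[g][s] += 1
    majority = {g: c.most_common(1)[0][0] for g, c in members.items()}
    return [majority[g] for g in groups]


# Toy data (invented): six respondents.
scheme = ["a", "a", "a", "b", "b", "b"]   # manual classification used as coding scheme
other = ["x", "x", "y", "y", "y", "y"]    # the second manual classification
clustering = [0, 0, 0, 1, 1, 1]           # algorithmic cluster ids

# Express both groupings in the coding scheme, then measure their agreement.
kappa = cohen_kappa(recode(clustering, scheme), recode(other, scheme))
print(round(kappa, 3))  # 0.667
```

In this toy case the recoded clustering and the recoded second classification agree on five of six respondents, with chance agreement of 0.5, which gives a kappa of about 0.667.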

Similar resources

Evaluation of a Walking Tractor Drawn Peanut Harvester and Comparing It With Manual Harvesting

The object of this study was evaluation of a walking tractor drawn peanut harvester at different conditions of soil moisture content and forward speed and comparing it with manual harvesting. The evaluation factors for peanut harvester were two levels of soil moisture content and three levels of forward speed. The results revealed that the effect of soil moisture content was only significant on...

A Cross-Comparison Of Two Clustering Methods

Many Natural Language Processing applications require semantic knowledge about topics in order to be possible or to be efficient. So we developed a system, SEGAPSITH, that acquires it automatically from text segments by using an unsupervised and incremental clustering method. In such an approach, an important problem consists of the validation of the learned classes. To do that, we applied anot...

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained from k-means method. Text clustering has many applications in various fields of natural language processing. So far, much English documents clustering research has been accomplished. Now this question arises, are the results of them extendable to other languages? Since the goal of document clustering is grouping of docum...

Language Independent Evaluation of Translation Style and Consistency: Comparing Human and Machine Translations of Camus' Novel "The Stranger"

We present quantitative and qualitative results of automatic and manual comparisons of translations of the originally French novel “The Stranger” (French: L’Étranger). We provide a novel approach to evaluating translation performance across languages without the need for reference translations or comparable corpora. Our approach examines the consistency of the translation of various document le...

Publication date: 2004